Abstract

This is the first report of a full genome scan of sexual orientation in men. A
sample of 456 individuals from 146 families with two or more gay brothers was
genotyped with 403 microsatellite markers at 10-cM intervals. Given that previously
reported evidence of maternal loading of transmission of sexual orientation could indicate
epigenetic factors acting on autosomal genes, maximum likelihood estimation (mlod)
scores were calculated separately for maternal, paternal, and combined transmission. The
highest mlod score was 3.45 at a position near D7S798 in 7q36 with approximately
equivalent maternal and paternal contributions. The second highest mlod score of 1.96
was located near D8S505 in 8p12, again with equal maternal and paternal contributions.
A maternal origin effect was found near marker D10S217 in 10q26, with a mlod score of
1.81 for maternal meioses and no paternal contribution. We did not find linkage to Xq28
in the full sample, but given the previously reported evidence of linkage in this region,
we conducted supplemental analyses to clarify these findings. First, we re-analyzed our
previously reported data and found a mlod of 6.47. We then re-analyzed our current data,
after limiting the sample to those families previously reported, and found a mlod of 1.99.
These Xq28 findings are discussed in detail. The results of this first genome screen for
normal variation in the behavioral trait of sexual orientation in males should encourage
efforts to replicate these findings in new samples with denser linkage maps in the
suggested regions.

Brian S. Mustanski and Michael G. DuPree contributed equally to this work. 


Introduction 

Although most males report primarily heterosexual attractions, a significant minority
(approximately 2%-6%) of males report predominantly homosexual attractions
(Diamond 1993; Laumann et al. 1994; Wellings et al. 1994). Multiple lines of evidence
suggest that biological factors play a role in explaining individual differences in male
sexual orientation (MIM 306995). For example, the third interstitial nucleus of the human
anterior hypothalamus (INAH3), which is significantly smaller in females, is also
reported to be smaller in homosexual males (LeVay 1991). Byne and colleagues (2001)
followed up on this finding by reporting a trend for INAH3 to occupy a smaller volume
in homosexual men than in heterosexual men, with no significant difference in the
number of neurons within the nucleus. Neuropsychological studies have reported
differences in performance with respect to tasks that show sex differences, such as spatial
processing (e.g., Rahman and Wilson 2003), which may indicate differences in relevant
neural correlates (e.g., parietal cortex). The strong link between adult sexual orientation
and childhood gender-related traits expressed at an early age (Bailey and Zucker 1995)
suggests that such biological influences act early in development, possibly prenatally.
Similarly, the correlation between sexual orientation and a variety of prenatally canalized
anthropometric traits suggests that sexual orientation differentiation probably occurs
before birth (for a review, see Mustanski et al. 2002). Despite this evidence, specific
neurodevelopmental pathways have yet to be elucidated.

Family and twin studies have provided evidence for a genetic component to male sexual
orientation. Family studies, using a variety of ascertainment strategies, document an
elevation in the rate of homosexuality among relatives of homosexual probands (for a
review, see Bailey and Pillard 1995). Several family studies report evidence of increased
maternal transmission of male homosexuality (Hamer et al. 1993; Rice et al. 1999a),
whereas others find no increase relative to paternal transmission (Bailey et al. 1999;
McKnight and Malcolm 2000). Twin studies consistently show that male sexual
orientation is moderately heritable (for a review, see Mustanski et al. 2002). For example,
two recent twin studies in population-based samples both report moderate heritability
estimates, with the remaining variance being explained by nonshared environmental
influences (Kendler et al. 2000; Kirk et al. 2000). The results from family and twin
studies demonstrate that sexual orientation is a complex (i.e., does not show simple
Mendelian inheritance) and multifactorial phenotype.

A more limited number of studies have attempted to map specific genes contributing to
variation in sexual orientation. Given the evidence for increased maternal transmission,
initial efforts focused on the X chromosome. One study produced evidence of significant
linkage, based on the criteria of Lander and Kruglyak (1995), to markers on Xq28 (Hamer et al.
1993). Another study, from the same laboratory but with a new sample, reported a
significant replication of these findings (Hu et al. 1995). An independent group produced
inconclusive results regarding linkage to Xq28 (discussed in Sanders and Dawood 2003)
but did not publish the findings in a peer-reviewed journal. All three of these studies
excluded families showing evidence for non-maternal transmission. A fourth study from
another independent group found no support for linkage, even when excluding cases with
suggestive father-to-son transmission (Rice et al. 1999b). An analysis of the results across
all four studies produced a statistically suggestive multiple scan probability (MSP) value
of 0.00003 (Sanders and Dawood 2003). Two candidate gene studies have been
conducted, both producing null results: one for the androgen receptor (AR; Macke et al.
1993) and another for aromatase (CYP19A1; Dupree et al. 2004), on Xq12 and 15q21.2,
respectively.

Given the complexity of sexual orientation, numerous genes are likely to be involved,
many of which are expected to be autosomal rather than sex-linked. Indeed, the modest
levels of linkage that have been reported for the X chromosome can account for, at most,
only a fraction of the overall heritability of male sexual orientation as deduced from twin
studies. Therefore, we have undertaken a genomewide linkage scan to aid in the
identification of genes contributing to variation in sexual orientation. As in previous
studies, we diminished the probability of false positives (i.e., gay men who identify as
heterosexual) by only studying self-identified gay men. Unlike previous studies that have
focused solely on the X chromosome and thus excluded families showing evidence of
non-maternal transmission, this study did not use transmission pattern as an exclusion
criterion. To consider the possibility that previously reported evidence of maternal loading
of transmission of sexual orientation was attributable to epigenetic factors acting on
autosomal genes, we calculated maximum likelihood estimation (mlod) scores separately
for maternal and paternal transmission, together with the combined statistic. Based on
the criteria of Lander and Kruglyak (1995), we found one region of near significance and
two regions close to the criteria for suggestive linkage.


Materials and methods 

Family ascertainment and assessment 

The sample consisted of a total of 456 individuals from 146 unrelated families, of which
137 families had two gay brothers and 9 families had three gay brothers. Thirty of the
families included one parent, and 30 of the families included both parents. Additionally,
46 of the families included at least one heterosexual male or female full sibling (up to 6
additional siblings per family). The sample included 40 families previously reported by
Hamer et al. (1993), 33 families previously reported by Hu et al. (1995), and 73
previously unreported families. The 73 previously described families were selected for
the presence of two gay brothers with no indication of non-maternal transmission by the
criteria described previously (Hamer et al. 1993; Hu et al. 1995). For the 73 new families,
the sole inclusion criterion was the presence of at least two self-acknowledged gay male
siblings.

Subjects were recruited through advertisements in local and national homophile
publications as described elsewhere (Hamer et al. 1993; Hu et al. 1995). The participants
were predominantly white (94.5%), college educated (87.4%), and of middle to upper
socioeconomic status. The mean (SD) age for the gay siblings was 36.98 (8.64). The
protocol was approved by the NCI Institutional Review Board, and each participant
signed an informed consent form prior to interview, questionnaire completion, and the
donation of blood for DNA extraction.

Sexual orientation was assessed through a structured interview or a questionnaire that 
included a sexual history and the Kinsey scales of sexual attraction, fantasy, behavior, 
and self-identification (Kinsey et al. 1948). Each scale ranges from 0 (exclusively
heterosexual) to 6 (exclusively homosexual). The mean (SD) of these four scales for the
gay males in this study was 5.65 (0.46).

Genotyping 

DNA was extracted from peripheral blood by a commercial service (Genetic Design,
Greensboro, N.C., USA). A multiplex polymerase chain reaction (PCR) was conducted as
described (Dupree et al. 2004), with 403 microsatellite markers from the ABI PRISM
Linkage Mapping Set Version 2.5 with an average resolution of 10 cM. Following the
manufacturer's guidelines, products were analyzed on an ABI Prism 310 or 3100 and
sized with the GeneScan version 3.1.2 program (PE Biosystems, Foster City, Calif.,
USA), and genotypes were assigned with the Genotyper version 3.6 program (PE
Biosystems). A PCR product from a DNA reference sample (CEPH 1347-02) was used to
monitor sizing conformity (PE Biosystems). Across the 403 markers, genotypes were
ascertained on average for 95% of the 456 individuals. Mendelian incompatibilities
(<0.05% of genotypes) were removed from the data prior to analyses by using the
sib_clean routine from ASPEX version 2.4 (Hinds and Risch 1996). The computer
program CERVUS 2.0 (Marshall et al. 1998) was employed to test for deviation from the
Hardy-Weinberg equilibrium (HW) and to calculate polymorphism information contents
(PICs) at all loci. We found that the markers had a mean (SD) PIC of 0.76 (0.08), and
1.31% of the markers deviated significantly from HW.
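
For readers unfamiliar with PIC, the following minimal Python sketch shows how the quantity is commonly computed from allele frequencies, using the standard definition of Botstein et al. (1980). It is an illustration only: the function name `pic` and the example frequencies are hypothetical, and whether CERVUS evaluates exactly this expression is an assumption.

```python
def pic(allele_freqs):
    """Polymorphism information content for one locus.

    Standard definition: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    `allele_freqs` is a list of allele frequencies that must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity term.
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction for matings that are uninformative despite heterozygosity.
    cross = 0.0
    for i in range(len(allele_freqs)):
        for j in range(i + 1, len(allele_freqs)):
            cross += 2.0 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
    return 1.0 - homozygosity - cross


# Hypothetical loci: two and four equally frequent alleles.
two_allele = pic([0.5, 0.5])
four_allele = pic([0.25, 0.25, 0.25, 0.25])
print(round(two_allele, 4), round(four_allele, 4))
```

Under this definition, PIC rises with the number and evenness of alleles, which is why multiallelic microsatellites tend to be more informative for linkage than biallelic markers.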

Statistical analyses 

Nonparametric exclusion mapping of affected sib-pair data (ASP) was performed by
using ASPEX version 2.4 (Hinds and Risch 1996). ASPEX calculates the percentage of
identical by descent (%IBD) sharing and reports the proportion of shared alleles of
paternal, maternal, and combined origin. The results for alleles of combined origin also
include alleles where the parental origin is unknown. We calculated mlod with a linear
model and assuming a multiplicative model. The ASPEX SIB_PHASE algorithm was
applied; this uses allele frequency information to reconstruct and to phase missing
parental information. Sex-specific recombination maps were used for the calculation of
multipoint mlod scores. Marker order and map positions were determined by using an
integrated map (Nievergelt et al. 2004) based on the deCODE genetic map and updated
physical map information.
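
To make the link between allele sharing and a lod score concrete, the sketch below computes a simple single-point binomial sharing lod for affected sib pairs. It is a textbook simplification under stated assumptions, not the ASPEX algorithm: it ignores multipoint information, phase reconstruction, and sex-specific maps, and it restricts the sharing estimate to be at least the null value of 0.5. The function names and the example counts are hypothetical.

```python
import math


def percent_ibd(shared, informative):
    """Percentage of informative meioses in which sibs share an allele IBD."""
    return 100.0 * shared / informative


def sharing_lod(shared, informative):
    """Single-point lod comparing the estimated sharing with the null of 0.5.

    The estimate is k/n, restricted to >= 0.5 because only excess sharing
    counts as evidence for linkage in this simplified model.
    """
    if informative <= 0 or not 0 <= shared <= informative:
        raise ValueError("need 0 <= shared <= informative and informative > 0")
    p_hat = max(shared / informative, 0.5)
    unshared = informative - shared
    lod = 0.0
    if shared:
        lod += shared * math.log10(p_hat / 0.5)
    if unshared:
        lod += unshared * math.log10((1.0 - p_hat) / 0.5)
    return lod


# Hypothetical counts: 60 of 100 informative meioses shared.
print(percent_ibd(60, 100), round(sharing_lod(60, 100), 3))
```

Applying the same calculation separately to maternal and paternal meioses gives parent-of-origin-specific scores of the kind reported here, although the values in this paper come from the multipoint ASPEX analysis rather than from this simplified formula.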



Results 

Results from the multipoint analyses on chromosomes 1 through 22 are shown in Fig. 1
for paternal, maternal, and combined meioses. Our complete genome scan for male
sexual orientation yielded three interesting peaks with mlod scores greater than 1.8,
located on chromosomes 7, 8, and 10. Table 1 contains additional information concerning
these peaks, including the nearest marker, the location, mlod, and allele sharing.
Additionally, Table 1 contains the approximate boundary of the linkage peak, by
reporting the approximate cM position at which the mlod score declines below 1.0. For
chromosomes 7 and 8, the peak is a result of approximately equal contributions from 
maternal and paternal transmission, whereas a maternal-origin effect was found for the 
peak on chromosome 10. 
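
The comparison of these peaks with the Lander and Kruglyak (1995) criteria can be sketched as follows. The thresholds used here (lod 2.2 for suggestive and 3.6 for significant linkage in sib-pair analyses) are the values commonly quoted from that paper and are stated as an assumption, since the present text does not list them. The function name `classify` is hypothetical, and the mlod values are those given in Table 1.

```python
# Assumed genomewide thresholds for sib-pair allele-sharing analyses,
# as commonly quoted from Lander and Kruglyak (1995).
SUGGESTIVE_LOD = 2.2
SIGNIFICANT_LOD = 3.6


def classify(lod):
    """Label a lod score relative to the assumed thresholds."""
    if lod >= SIGNIFICANT_LOD:
        return "significant"
    if lod >= SUGGESTIVE_LOD:
        return "suggestive"
    return "below suggestive"


# Peak mlod scores as listed in Table 1.
peaks = {
    "7q36 (combined)": 3.45,
    "8p12 (combined)": 1.96,
    "10q26 (maternal)": 1.89,
}

for region, mlod in peaks.items():
    print(f"{region}: mlod {mlod:.2f} -> {classify(mlod)}")
```

Under these assumed thresholds, the 7q36 peak exceeds the suggestive level but not the significant one, and the other two peaks fall somewhat below the suggestive level, which matches the description of one region near significance and two regions close to suggestive linkage.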



Fig. 1 Multipoint linkage analysis for chromosomes 1 through 22. The y-axis is the mlod
score. Graphics included for combined (a), maternal (b), and paternal (c) meioses





















Table 1 Chromosomal locations with nominally significant linkage peaks. The cM
positions in parentheses indicate the boundary at which the mlod score declines below
1.0. For chromosomes 7 and 8, the position is based on the combined map, but for
chromosome 10, the position is based on the female map.

Nearby     Location                        mlod                              Percentage
marker     cM                     Cyto     Paternal   Maternal   Combined    of sharing

D7S798     169.9 (155.1-end)      7q36      2.05       2.26       3.45       62.59
D8S505     54.2 (45.1-64.8)       8p12      1.38       0.93       1.96       60.10
D10S217    208.1 (201.8-217.4)    10q26    -0.13       1.89       1.43       58.51


Figure 2 shows the multipoint mlod plots for the X chromosome. Analyses of the full
sample (dashed line) did not produce any chromosomal regions with mlod scores greater
than 1.0. Given the previous evidence of linkage to Xq28 with a portion of the sample
reported here (Hamer et al. 1993; Hu et al. 1995), we performed supplemental analyses to
determine why we did not find linkage in the full sample. We began by re-analyzing the
data from the previously reported 73 families, which had been selected for showing no
evidence of paternal transmission, by using updated marker positions (dotted line). This
produced a maximum mlod score of 6.47 for markers in the Xq28 region. We then
performed a linkage analysis, with only the markers from the ABI linkage mapping set,
on these same 73 families. This produced a maximum mlod score of 1.99 for markers in
the Xq28 region. Although the mlod score is higher when using the current markers in the
limited sample compared with the full sample (1.99 vs. 0.35), it is still considerably lower
than that obtained with the previously reported markers in the limited sample. We provide
Table 2 in order to help clarify these results. Table 2 provides singlepoint and multipoint
results for the 73 previously reported families on all markers ever reported from our group,
starting with the most telomeric new Xq28 marker. Table 2 makes it clear that, although the
multipoint results suggest a dramatic change in mlod score between the current markers and
the previously reported markers (6.47 vs. 1.99 for markers 0.62 cM apart), the singlepoint
results are not dramatically different (2.23 vs. 1.47). This difference is likely to be
attributable to two factors. First, the previous reports focused on the X chromosome and
contained many more markers in the Xq28 region; the previously reported markers had
an average resolution of 1 marker every 1.12 cM, whereas the current markers had an
average resolution of 6.97 cM in the Xq28 region. The higher concentration of previously
reported markers surely allowed for the extraction of more multipoint linkage
information. Second, there were more telomeric markers in the previously reported
mapping sets than in the current one. The singlepoint results showed a trend for higher
mlod scores closer to the telomere, with the exception of JXYQ28, which had a low PIC
(0.28).
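
The marker spacing that underlies this density argument can be checked with a few lines of arithmetic. The sketch below recomputes the distance between consecutive Xq28 markers from the map positions listed in Table 2; the function name `spacings` is hypothetical. Because the tabulated positions are rounded to two decimals, a recomputed distance can differ from the tabulated one in the last digit.

```python
# Map positions (cM) of the Xq28 markers as listed in Table 2.
markers = [
    ("DXS1073", 188.22),
    ("F8C", 188.84),
    ("DXS1108", 190.32),
    ("JXYQ28", 190.47),
    ("DXYS154", 190.79),
]


def spacings(ordered_markers):
    """Distance (cM) from each marker to the preceding one on the map."""
    return [
        (name_b, round(pos_b - pos_a, 2))
        for (_, pos_a), (name_b, pos_b) in zip(ordered_markers, ordered_markers[1:])
    ]


for name, gap in spacings(markers):
    print(f"{name}: {gap:.2f} cM from previous marker")
```

The first gap, between the current marker DXS1073 and the previously reported marker F8C, is the 0.62 cM distance over which the multipoint mlod differs so sharply.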
































Fig. 2 Multipoint linkage analysis for the X chromosome. The x-axis is the chromosome
location (cM), and the y-axis is the mlod score. Lines shown: current markers with sample
restricted to previously reported families; current markers with full sample (dashed line);
previously reported markers with previously reported families (dotted line)


Table 2 Supplemental analyses comparing Xq28 results across markers reported on in
1993, 1995, and the current report. All analyses reported here are based on the sample
restricted to those families previously reported. Current markers and previously reported
markers were analyzed separately for the purpose of calculating multipoint mlod scores.

Marker     Study      Location   Marker           Multipoint mlod       Multipoint mlod      Singlepoint
           year       (cM)       distance (cM)    (previous markers)    (current markers)    mlod

DXS1073    Current    188.22                                            1.99                 1.47
F8C        1993       188.84     0.62             6.47                  1                    2.23
DXS1108    1993       190.32     1.47             6.27                  1                    4.22
JXYQ28     1995       190.47     0.15             6.28                  1                    0.48
DXYS154    1993       190.79     0.32             5.71                  1                    3.53


Discussion 


This study reports results from the first full genome scan for male sexual orientation.
Using 73 previously reported families and 73 new families with two or more gay male
siblings, we found three new regions of genetic interest. Our strongest finding was on
7q36 with a combined mlod score of 3.45 and equal contribution from maternal and
paternal allele transmission. This score falls just short of the Lander and Kruglyak (1995)
criteria for genomewide significance. Several interesting candidate genes map to this
region of chromosome 7. Vasoactive intestinal peptide (VIP) receptor type 2 (VIPR2;
MIM 601970) is a G protein-coupled receptor that activates adenylate cyclase in response
to VIP (Metwali et al. 1996), which functions as a neurotransmitter and as a
neuroendocrine hormone. VIPR2 is essential for the development of the hypothalamic
suprachiasmatic nucleus in mice (Harmar et al. 2002), which makes it an interesting
candidate gene for sexual orientation in view of earlier reports of an enlarged
suprachiasmatic nucleus in homosexual men (Swaab and Hofman 1990). Sonic hedgehog
(SHH; MIM 600725) plays an essential role in patterning the early embryo, including
hemisphere separation (Roessler et al. 1996) and left to right asymmetry (Tsukui et al.
1999). Homosexual men and women show a significant increase in non-right-handedness,
which is related to brain asymmetry (Lalumiere et al. 2000).

Two additional regions approached the criteria for suggestive linkage. The region near
8p12 contains several interesting candidate genes, given the hypothesized relationship
between prenatal hormones and sexual orientation (Mustanski et al. 2002). Gonadotropin-
releasing hormone 1 (GNRH1; MIM 152760) stimulates both the synthesis and release of
luteinizing hormone and follicle-stimulating hormone, which are important regulators of
steroidogenesis in the gonads, and inhibits the release of prolactin (Adelman et al. 1986).
GnRH is synthesized in the arcuate nucleus and other nuclei of the hypothalamus
(Kawakami et al. 1975). Steroidogenic acute regulatory protein (STAR; MIM 600617)
mediates pregnenolone synthesis and is involved in the hypothalamic-pituitary regulation
of adrenal steroid production (Sugawara et al. 1995), which in turn plays an important
role in sexual development. Neuregulin 1 (NRG1; MIM 142445) produces a variety of
isoforms that regulate the growth and differentiation of neuronal and glial cells through
interaction with ERBB receptors (Burden and Yarden 1997; Wen et al. 1994).

The 10q26 region is of special interest because it results from excess sharing of maternal
but not paternal alleles. Previous studies have suggested that there is an excess of
homosexual family members related to the proband through the mother, and we have
proposed previously that this might result in part from genomic imprinting (Bocklandt
and Hamer 2003). In support of a connection between 10q26 and imprinting, a germline
differentially methylated region has been identified at this location by
Strichman-Almashanu et al. (2002), who performed a genomewide screen for normally
methylated CpG islands and found 12 regions to be differentially methylated in uniparental
tissues of germline origin, i.e., hydatidiform moles (paternal origin) and complete ovarian
teratomas (maternal origin). Such CpG islands can regulate the expression of imprinted
genes over distances of several hundred kilobases. The region around the 10q26 CpG
islands includes the brain-expressed gene Shadow of Prion Protein (SPRN), several
transcription regulators (ZNF511, VENTX2; MIM 607158), neurotransmitter interacting
proteins (DRD1IP; MIM 604647), and cell signaling pathway proteins (INPP5A; MIM
600106, GPR123).

Four previous linkage studies have been conducted on the X chromosome and together
produce a statistically suggestive MSP in the Xq28 region (Sanders and Dawood 2003).
Because the focus of this study was a full genome scan with the ABI linkage mapping set
on a partially new set of families, we began by reporting results for these markers on the
full sample. This analysis did not produce evidence of linkage in the Xq28 region;
therefore, we conducted supplemental analyses to clarify this result given previous
findings. Our first supplemental analysis combined results from the two previous reports
from our group (Hamer et al. 1993; Hu et al. 1995) in order to determine the magnitude
of the linkage signal in the 73 previously reported families that comprise half
of the current sample. This produced a mlod of 6.47. To determine whether the lack of
linkage evidence in the full sample was attributable to the new markers or the additional
families (who were not selected based on family transmission patterns), we then
conducted analyses on the previously reported families by using the markers from the
ABI linkage mapping set. This produced a mlod score of 1.99. Table 2, which provides
a summary of the singlepoint and multipoint results for this comparison, suggests
that the difference in mlod score between the restricted sample with the old and new
markers is attributable to the non-optimal position and density of the new markers. The
difference in mlod scores between the full sample and the sample restricted to families
without evidence of paternal transmission (with the goal of enriching the sample for
families showing maternal transmission) denotes the possibility of etiologic heterogeneity
for the proposed Xq28 locus.

Several limitations of the current study should be noted. First, we were unable to
calculate empirically derived significance levels for this project because none of the
simulation programs that currently exist allow for the use of sex-specific maps with ASP
data. Future development of simulation programs that allow for the incorporation of this
important information should remove this limitation. Second, our marker set
had an average resolution of 10 cM, which may have led to underestimated mlod scores.
We discuss in detail above the likely negative effects that this had on our X chromosome
results. Optimally, genome scans are followed up with dense markers placed in promising
regions, but because of financial limitations, we were unable to do this. Future studies
will undoubtedly employ more sophisticated and dense marker sets. Third, we analyzed
only 146 independent families, which is a small sample for a complex trait such as sexual
orientation. Approximately half of these families have previously been included in
reports on the X chromosome (Hamer et al. 1993; Hu et al. 1995). Future research should
be conducted on a new and larger sample of participants. Our linkage results should be
interpreted with consideration of the fact that we only included families with at least two
self-identified gay brothers. Our results may not extrapolate to individuals who do not meet
our inclusion criteria, such as individuals who engage in same-sex behavior but do not
identify as gay or individuals who identify as bisexual. The definition of homosexuality is
complicated, and future genetic research would benefit from additional phenotype
development or the identification of endophenotypes for sexual orientation (Mustanski et
al. 2002). The identification of basic processes that underlie sexual orientation could
increase the power of future genetic studies. A related limitation is that we did not
include females in our study because it is not yet clear if female sexual orientation is
determined by the same factors as male sexual orientation (for a discussion, see
Mustanski et al. 2002). Future research with mixed-sex samples should help to answer
this question. Finally, we did not collect data on the number of older brothers, which
shows a robust association with male sexual orientation (Blanchard 2004). Future studies
should collect these data to allow for explorations of gene by environment interactions; this
could increase the ability to identify genetic loci and also help to elucidate the process
linking number of older brothers to sexual orientation.

In summary, we report the first genome scan for loci involved in the complex phenotype 
of male sexual orientation. We have also identified several chromosomal regions and 
candidate genes for future exploration. The molecular analysis of genes involved in 
sexual orientation could greatly advance our understanding of human variation, 
evolution, and brain development. In the absence of obvious animal models, genetic 
linkage and association studies provide the best opportunity for discovering these loci. 